

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8">
  
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
  <title>Scrapy 2.3 documentation &mdash; Scrapy 2.3.0 documentation</title>
  

  
  <link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
  <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster.custom.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster.bundle.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster-sideTip-shadow.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster-sideTip-punk.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster-sideTip-noir.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster-sideTip-light.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/tooltipster-sideTip-borderless.min.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/micromodal.css" type="text/css" />
  <link rel="stylesheet" href="_static/css/sphinx_rtd_theme.css" type="text/css" />

  
  
  
  

  
  <!--[if lt IE 9]>
    <script src="_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
    
      <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
        <script src="_static/jquery.js"></script>
        <script src="_static/underscore.js"></script>
        <script src="_static/doctools.js"></script>
        <script src="_static/language_data.js"></script>
        <script src="_static/js/hoverxref.js"></script>
        <script src="_static/js/tooltipster.bundle.min.js"></script>
        <script src="_static/js/micromodal.min.js"></script>
    
    <script type="text/javascript" src="_static/js/theme.js"></script>

    
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Scrapy at a glance" href="intro/overview.html" /> 
</head>

<body class="wy-body-for-nav">

   
  <div class="wy-grid-for-nav">
    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >
          

          
            <a href="#" class="icon icon-home" alt="Documentation Home"> Scrapy
          

          
          </a>

          
            
            
              <div class="version">
                2.3
              </div>
            
          

          
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>

          
        </div>

        
        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
          
            
            
              
            
            
              <p class="caption"><span class="caption-text">First steps</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="intro/overview.html">Scrapy at a glance</a></li>
<li class="toctree-l1"><a class="reference internal" href="intro/install.html">Installation guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="intro/tutorial.html">Scrapy Tutorial</a></li>
<li class="toctree-l1"><a class="reference internal" href="intro/examples.html">Examples</a></li>
</ul>
<p class="caption"><span class="caption-text">Basic concepts</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="topics/commands.html">Command line tool</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/spiders.html">Spiders</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/selectors.html">Selectors</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/items.html">Items</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/loaders.html">Item Loaders</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/shell.html">Scrapy shell</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/item-pipeline.html">Item Pipeline</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/feed-exports.html">Feed exports</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/request-response.html">Requests and Responses</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/link-extractors.html">Link Extractors</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/settings.html">Settings</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/exceptions.html">Exceptions</a></li>
</ul>
<p class="caption"><span class="caption-text">Built-in services</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="topics/logging.html">Logging</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/stats.html">Stats Collection</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/email.html">Sending e-mail</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/telnetconsole.html">Telnet Console</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/webservice.html">Web Service</a></li>
</ul>
<p class="caption"><span class="caption-text">Solving specific problems</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="faq.html">Frequently Asked Questions</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/debug.html">Debugging Spiders</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/contracts.html">Spiders Contracts</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/practices.html">Common Practices</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/broad-crawls.html">Broad Crawls</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/developer-tools.html">Using your browser's Developer Tools for scraping</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/dynamic-content.html">Selecting dynamically-loaded content</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/leaks.html">Debugging memory leaks</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/media-pipeline.html">Downloading and processing files and images</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/deploy.html">Deploying Spiders</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/autothrottle.html">AutoThrottle extension</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/benchmarking.html">Benchmarking</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/jobs.html">Jobs: pausing and resuming crawls</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/coroutines.html">Coroutines</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/asyncio.html">asyncio</a></li>
</ul>
<p class="caption"><span class="caption-text">Extending Scrapy</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="topics/architecture.html">Architecture overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/downloader-middleware.html">Downloader Middleware</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/spider-middleware.html">Spider Middleware</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/extensions.html">Extensions</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/api.html">Core API</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/signals.html">Signals</a></li>
<li class="toctree-l1"><a class="reference internal" href="topics/exporters.html">Item Exporters</a></li>
</ul>
<p class="caption"><span class="caption-text">All the rest</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="news.html">Release notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing to Scrapy</a></li>
<li class="toctree-l1"><a class="reference internal" href="versioning.html">Versioning and API stability</a></li>
</ul>

            
          
        </div>
        
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" aria-label="top navigation">
        
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="#">Scrapy</a>
        
      </nav>


      <div class="wy-nav-content">
        
        <div class="rst-content">
        
          















          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
  <div class="section" id="scrapy-version-documentation">
<span id="topics-index"></span><h1>Scrapy 2.3 documentation<a class="headerlink" href="#scrapy-version-documentation" title="Permalink to this headline">¶</a></h1>
<p>Scrapy is a fast, high-level <a class="reference external" href="https://en.wikipedia.org/wiki/Web_crawler">web crawling</a> and <a class="reference external" href="https://en.wikipedia.org/wiki/Web_scraping">web scraping</a> framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.</p>
<div class="section" id="getting-help">
<h2>Getting help<a class="headerlink" href="#getting-help" title="Permalink to this headline">¶</a></h2>
<p>Having trouble? We'd like to help!</p>
<ul class="simple">
<li><p>Try the <a class="reference internal" href="faq.html"><span class="doc">FAQ</span></a>; it has answers to some common questions.</p></li>
<li><p>Looking for specific information? Try the <a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a> or the <a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a>.</p></li>
<li><p>Ask or search questions on StackOverflow using the scrapy tag.</p></li>
<li><p>Ask or search questions in the Scrapy subreddit.</p></li>
<li><p>Search for questions in the archives of the scrapy-users mailing list.</p></li>
<li><p>Ask a question in the #scrapy IRC channel.</p></li>
<li><p>Report bugs with Scrapy in our issue tracker.</p></li>
</ul>
</div>
<div class="section" id="first-steps">
<h2>First steps<a class="headerlink" href="#first-steps" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="intro/overview.html"><span class="doc">Scrapy at a glance</span></a></dt><dd><p>Understand what Scrapy is and how it can help you.</p>
</dd>
<dt><a class="reference internal" href="intro/install.html"><span class="doc">Installation guide</span></a></dt><dd><p>Get Scrapy installed on your computer.</p>
</dd>
<dt><a class="reference internal" href="intro/tutorial.html"><span class="doc">Scrapy Tutorial</span></a></dt><dd><p>Write your first Scrapy project.</p>
</dd>
<dt><a class="reference internal" href="intro/examples.html"><span class="doc">Examples</span></a></dt><dd><p>Learn more by playing with a pre-made Scrapy project.</p>
</dd>
</dl>
</div>
<div class="section" id="basic-concepts">
<span id="section-basics"></span><h2>Basic concepts<a class="headerlink" href="#basic-concepts" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="topics/commands.html"><span class="doc">Command line tool</span></a></dt><dd><p>Learn about the command-line tool used to manage your Scrapy project.</p>
</dd>
<dt><a class="reference internal" href="topics/spiders.html"><span class="doc">Spiders</span></a></dt><dd><p>Write the rules to crawl your websites.</p>
</dd>
<dt><a class="reference internal" href="topics/selectors.html"><span class="doc">Selectors</span></a></dt><dd><p>Extract the data from web pages using XPath.</p>
</dd>
<dt><a class="reference internal" href="topics/shell.html"><span class="doc">Scrapy shell</span></a></dt><dd><p>Test your extraction code in an interactive environment.</p>
</dd>
<dt><a class="reference internal" href="topics/items.html"><span class="doc">Items</span></a></dt><dd><p>Define the data you want to scrape.</p>
</dd>
<dt><a class="reference internal" href="topics/loaders.html"><span class="doc">Item Loaders</span></a></dt><dd><p>Populate your items with the extracted data.</p>
</dd>
<dt><a class="reference internal" href="topics/item-pipeline.html"><span class="doc">Item Pipeline</span></a></dt><dd><p>Post-process and store your scraped data.</p>
</dd>
<dt><a class="reference internal" href="topics/feed-exports.html"><span class="doc">Feed exports</span></a></dt><dd><p>Output your scraped data using different formats and storages.</p>
</dd>
<dt><a class="reference internal" href="topics/request-response.html"><span class="doc">Requests and Responses</span></a></dt><dd><p>Understand the classes used to represent HTTP requests and responses.</p>
</dd>
<dt><a class="reference internal" href="topics/link-extractors.html"><span class="doc">Link Extractors</span></a></dt><dd><p>Convenient classes to extract links to follow from pages.</p>
</dd>
<dt><a class="reference internal" href="topics/settings.html"><span class="doc">Settings</span></a></dt><dd><p>Learn how to configure Scrapy and see all <a class="reference internal" href="topics/settings.html#topics-settings-ref"><span class="std std-ref">available settings</span></a>.</p>
</dd>
<dt><a class="reference internal" href="topics/exceptions.html"><span class="doc">Exceptions</span></a></dt><dd><p>See all available exceptions and their meaning.</p>
</dd>
</dl>
</div>
<div class="section" id="built-in-services">
<h2>Built-in services<a class="headerlink" href="#built-in-services" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="topics/logging.html"><span class="doc">Logging</span></a></dt><dd><p>Learn how to use Python's built-in logging with Scrapy.</p>
</dd>
<dt><a class="reference internal" href="topics/stats.html"><span class="doc">Stats Collection</span></a></dt><dd><p>Collect statistics about your scraping crawler.</p>
</dd>
<dt><a class="reference internal" href="topics/email.html"><span class="doc">Sending e-mail</span></a></dt><dd><p>Send email notifications when certain events occur.</p>
</dd>
<dt><a class="reference internal" href="topics/telnetconsole.html"><span class="doc">Telnet Console</span></a></dt><dd><p>Inspect a running crawler using a built-in Python console.</p>
</dd>
<dt><a class="reference internal" href="topics/webservice.html"><span class="doc">Web Service</span></a></dt><dd><p>Monitor and control a crawler using a web service.</p>
</dd>
</dl>
</div>
<div class="section" id="solving-specific-problems">
<h2>Solving specific problems<a class="headerlink" href="#solving-specific-problems" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="faq.html"><span class="doc">Frequently Asked Questions</span></a></dt><dd><p>Get answers to the most frequently asked questions.</p>
</dd>
<dt><a class="reference internal" href="topics/debug.html"><span class="doc">Debugging Spiders</span></a></dt><dd><p>Learn how to debug common problems in your Scrapy spiders.</p>
</dd>
<dt><a class="reference internal" href="topics/contracts.html"><span class="doc">Spiders Contracts</span></a></dt><dd><p>Learn how to use contracts for testing your spiders.</p>
</dd>
<dt><a class="reference internal" href="topics/practices.html"><span class="doc">Common Practices</span></a></dt><dd><p>Get familiar with some common Scrapy practices.</p>
</dd>
<dt><a class="reference internal" href="topics/broad-crawls.html"><span class="doc">Broad Crawls</span></a></dt><dd><p>Tune Scrapy for crawling many domains in parallel.</p>
</dd>
<dt><a class="reference internal" href="topics/developer-tools.html"><span class="doc">Using your browser's Developer Tools for scraping</span></a></dt><dd><p>Learn how to scrape with your browser's developer tools.</p>
</dd>
<dt><a class="reference internal" href="topics/dynamic-content.html"><span class="doc">Selecting dynamically-loaded content</span></a></dt><dd><p>Read webpage data that is loaded dynamically.</p>
</dd>
<dt><a class="reference internal" href="topics/leaks.html"><span class="doc">Debugging memory leaks</span></a></dt><dd><p>Learn how to find and get rid of memory leaks in your crawler.</p>
</dd>
<dt><a class="reference internal" href="topics/media-pipeline.html"><span class="doc">Downloading and processing files and images</span></a></dt><dd><p>Download files and/or images associated with your scraped items.</p>
</dd>
<dt><a class="reference internal" href="topics/deploy.html"><span class="doc">Deploying Spiders</span></a></dt><dd><p>Deploy your Scrapy spiders and run them on a remote server.</p>
</dd>
<dt><a class="reference internal" href="topics/autothrottle.html"><span class="doc">AutoThrottle extension</span></a></dt><dd><p>Adjust the crawl rate dynamically based on load.</p>
</dd>
<dt><a class="reference internal" href="topics/benchmarking.html"><span class="doc">Benchmarking</span></a></dt><dd><p>Check how Scrapy performs on your hardware.</p>
</dd>
<dt><a class="reference internal" href="topics/jobs.html"><span class="doc">Jobs: pausing and resuming crawls</span></a></dt><dd><p>Learn how to pause and resume crawls for large spiders.</p>
</dd>
<dt><a class="reference internal" href="topics/coroutines.html"><span class="doc">Coroutines</span></a></dt><dd><p>Use the <a class="reference external" href="https://docs.python.org/3/reference/compound_stmts.html#async" title="(in Python v3.9)"><span class="xref std std-ref">coroutine syntax</span></a>.</p>
</dd>
<dt><a class="reference internal" href="topics/asyncio.html"><span class="doc">asyncio</span></a></dt><dd><p>Use <a class="reference external" href="https://docs.python.org/3/library/asyncio.html#module-asyncio" title="(in Python v3.9)"><code class="xref py py-mod docutils literal notranslate"><span class="pre">asyncio</span></code></a> and <a class="reference external" href="https://docs.python.org/3/library/asyncio.html#module-asyncio" title="(in Python v3.9)"><code class="xref py py-mod docutils literal notranslate"><span class="pre">asyncio</span></code></a>-powered libraries.</p>
</dd>
</dl>
</div>
<div class="section" id="extending-scrapy">
<span id="id1"></span><h2>Extending Scrapy<a class="headerlink" href="#extending-scrapy" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="topics/architecture.html"><span class="doc">Architecture overview</span></a></dt><dd><p>Understand the Scrapy architecture.</p>
</dd>
<dt><a class="reference internal" href="topics/downloader-middleware.html"><span class="doc">Downloader Middleware</span></a></dt><dd><p>Customize how pages get requested and downloaded.</p>
</dd>
<dt><a class="reference internal" href="topics/spider-middleware.html"><span class="doc">Spider Middleware</span></a></dt><dd><p>Customize the input and output of your spiders.</p>
</dd>
<dt><a class="reference internal" href="topics/extensions.html"><span class="doc">Extensions</span></a></dt><dd><p>Extend Scrapy with your custom functionality.</p>
</dd>
<dt><a class="reference internal" href="topics/api.html"><span class="doc">Core API</span></a></dt><dd><p>Use it in extensions and middlewares to extend Scrapy functionality.</p>
</dd>
<dt><a class="reference internal" href="topics/signals.html"><span class="doc">Signals</span></a></dt><dd><p>See all available signals and how to work with them.</p>
</dd>
<dt><a class="reference internal" href="topics/exporters.html"><span class="doc">Item Exporters</span></a></dt><dd><p>Quickly export your scraped items to a file (XML, CSV, etc.).</p>
</dd>
</dl>
</div>
<div class="section" id="all-the-rest">
<h2>All the rest<a class="headerlink" href="#all-the-rest" title="Permalink to this headline">¶</a></h2>
<div class="toctree-wrapper compound">
</div>
<dl class="simple">
<dt><a class="reference internal" href="news.html"><span class="doc">Release notes</span></a></dt><dd><p>See what has changed in recent Scrapy versions.</p>
</dd>
<dt><a class="reference internal" href="contributing.html"><span class="doc">Contributing to Scrapy</span></a></dt><dd><p>Learn how to contribute to the Scrapy project.</p>
</dd>
<dt><a class="reference internal" href="versioning.html"><span class="doc">Versioning and API stability</span></a></dt><dd><p>Understand Scrapy versioning and API stability.</p>
</dd>
</dl>
</div>
</div>


           </div>
           
          </div>
          <footer>
  
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
      
        <a href="intro/overview.html" class="btn btn-neutral float-right" title="Scrapy at a glance" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
      
      
    </div>
  

  <hr/>

  <div role="contentinfo">
    <p>
        
        &copy; Copyright 2008–2020, Scrapy developers
      <span class="lastupdated">
        Last updated on Oct 18, 2020.
      </span>

    </p>
  </div>
    
    
    
    Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a
    
    <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a>
    
    provided by <a href="https://readthedocs.org">Read the Docs</a>. 

</footer>

        </div>
      </div>

    </section>

  </div>
  

  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>

  
  
    
  
 


</body>
</html>